Statistical language modeling using a variable context length
Author
Abstract
In this paper we investigate statistical language models with a variable context length. In such models, the number of relevant words in a context is not fixed, as it is in conventional M-gram models, but depends on the context itself. We develop a measure of the quality of variable-length models and, based on this measure, present a pruning algorithm for creating such models. Furthermore, we address the question of how the use of a special backing-off distribution can improve the language models. Experiments were performed on two databases: the ARPA NAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. In addition, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically with the presented pruning algorithm.
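To make the idea of a variable context length concrete, here is a minimal sketch, not the authors' method. The abstract does not define its quality measure or its special backing-off distribution, so this sketch assumes two common stand-ins: interpolated absolute discounting (discount `d = 0.5`) for backing off to shorter contexts, and a weighted Kullback-Leibler divergence, in the spirit of entropy-based pruning, as the criterion for deciding whether a context is worth keeping. All function names (`count_contexts`, `backoff_context`, `prob`, `prune`) and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
import math


def count_contexts(tokens, max_len):
    """Next-word counts for every context of length 0..max_len seen in tokens."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(min(i, max_len) + 1):
            counts[tuple(tokens[i - k:i])][word] += 1
    return dict(counts)


def backoff_context(context, kept):
    """Longest proper suffix of `context` still in the model (() is always kept)."""
    context = context[1:]
    while context and context not in kept:
        context = context[1:]
    return context


def prob(word, context, counts, kept, vocab, d=0.5):
    """P(word | context) with interpolated absolute discounting (assumed smoothing)."""
    if context and context not in kept:
        # Variable length: use the longest suffix of the history that was kept.
        return prob(word, backoff_context(context, kept), counts, kept, vocab, d)
    c = counts.get(context, Counter())
    total = sum(c.values())
    lower = (1.0 / len(vocab) if context == ()
             else prob(word, backoff_context(context, kept), counts, kept, vocab, d))
    if total == 0:
        return lower
    return max(c[word] - d, 0) / total + d * len(c) / total * lower


def prune(counts, vocab, threshold, d=0.5):
    """Drop contexts whose weighted KL divergence from their back-off is < threshold.

    The weighted-KL criterion is an assumed stand-in for the paper's quality measure.
    """
    kept = set(counts)
    n_tokens = sum(counts[()].values())
    for context in sorted(counts, key=len, reverse=True):  # longest contexts first
        if not context:
            continue  # the empty context is never pruned
        parent = backoff_context(context, kept)
        weight = sum(counts[context].values()) / n_tokens
        kl = 0.0
        for w in vocab:
            p = prob(w, context, counts, kept, vocab, d)
            q = prob(w, parent, counts, kept, vocab, d)
            kl += p * math.log(p / q)
        if weight * kl < threshold:
            kept.discard(context)
    return kept


# Usage on a toy corpus (hypothetical data).
tokens = "the cat sat on the mat the cat sat on the hat".split()
vocab = sorted(set(tokens))
counts = count_contexts(tokens, max_len=3)
kept = prune(counts, vocab, threshold=0.01)
print(f"contexts before pruning: {len(counts)}, after: {len(kept)}")
print(prob("sat", ("the", "cat"), counts, kept, vocab))
```

The design choice worth noting is that pruning never removes the empty context, so every history can fall back to the unigram level, and a pruned context simply hands its predictions to its longest surviving suffix. A larger `threshold` trades modeling accuracy for a smaller model, which mirrors the size/performance trade-off the abstract describes.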
Similar resources
Using Multiple-Variable Matching to Identify EFL Ecological Sources of Differential Item Functioning
Context is a vague notion with numerous building blocks, which makes inferences from language test scores quite convoluted. This study has made use of a model of item responding that has striven to theorize the contextual infrastructure of differential item functioning (DIF) research and help specify the sources of DIF. Two steps were taken in this research: first, to identify DIF by gender grouping via l...
Strategic Competence and Foreign Language test Performance in Iranian Context
A number of studies have accounted for the integral role of foreign/second language learning and learner strategy use. However, only a few of these studies have considered the relationships between strategic competence and its use and foreign language performance (FLP). This study applied structural equation modeling to investigate in depth the relationships between test takers’ strategy use and their per...
A General MCMC Method for Bayesian Inference in Logic-Based Probabilistic Modeling
We propose a general MCMC method for Bayesian inference in logic-based probabilistic modeling. It covers a broad class of generative models including Bayesian networks and PCFGs. The idea is to generalize an MCMC method for PCFGs to the one for a Turing-complete probabilistic modeling language PRISM in the context of statistical abduction where parse trees are replaced with explanations. We des...
Tone modeling using Gaussian process latent variable model for statistical speech synthesis
In continuous Thai speech, tone pronunciation is affected by several factors. One significant factor is stress, which causes a diversity of tone F0 contours and affects syllable durations. Our previous studies have shown that a stressed/unstressed syllable context improves tone modeling accuracy. However, stress in Thai is generally unknown for a given input text ...